A mixture of Gaussians front end for speech recognition
Authors
Abstract
This paper describes a feature extraction technique based on fitting a Gaussian mixture model (GMM) to the speech spectral envelope. The features obtained (the component means, variances and priors) both represent the general shape of the spectrum and provide information on the position of the spectral peaks. Because the features select peaks in the spectrum, they are related to the formant amplitudes, locations and bandwidths. Results are presented on the Resource Management corpus, a medium-vocabulary task. Although the GMM features do not outperform MFCC features by themselves, systems combining the GMM system with a standard front end are shown to give a reduction in word error rate.
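As an illustration of the idea in the abstract, the sketch below fits a one-dimensional GMM to a single magnitude spectrum by treating the spectrum as a histogram over frequency bins and running a weighted EM loop. It is a minimal sketch under stated assumptions and not necessarily the procedure used in the paper: the function name `fit_spectral_gmm`, the initialisation, the variance floor and the iteration count are all hypothetical choices made for this example.

```python
import numpy as np


def fit_spectral_gmm(magnitude, n_components=4, n_iter=200, var_floor=1.0):
    """Fit a 1-D GMM to a magnitude spectrum treated as a histogram over bins.

    Returns (priors, means, variances) sorted by mean; means and variances
    are expressed in frequency-bin units. All defaults are illustrative.
    """
    magnitude = np.asarray(magnitude, dtype=float)
    n_bins = magnitude.size
    bins = np.arange(n_bins, dtype=float)
    mass = magnitude / magnitude.sum()  # normalise the spectrum to sum to 1

    # Assumed initialisation: equal priors, evenly spaced means, wide variances.
    priors = np.full(n_components, 1.0 / n_components)
    means = (np.arange(n_components) + 0.5) * n_bins / n_components
    variances = np.full(n_components, (n_bins / (2.0 * n_components)) ** 2)

    for _ in range(n_iter):
        # E-step: responsibility of each component for each frequency bin.
        diff = bins[None, :] - means[:, None]
        dens = (priors[:, None]
                * np.exp(-0.5 * diff ** 2 / variances[:, None])
                / np.sqrt(2.0 * np.pi * variances[:, None]))
        resp = dens / np.maximum(dens.sum(axis=0, keepdims=True), 1e-300)

        # M-step: statistics weighted by the spectral mass in each bin.
        weighted = resp * mass[None, :]
        occ = np.maximum(weighted.sum(axis=1), 1e-12)
        priors = occ / occ.sum()
        means = (weighted * bins[None, :]).sum(axis=1) / occ
        diff = bins[None, :] - means[:, None]
        variances = np.maximum(
            (weighted * diff ** 2).sum(axis=1) / occ, var_floor)

    order = np.argsort(means)
    return priors[order], means[order], variances[order]


# Usage on a synthetic two-peak "spectrum" (made-up data, not from the paper).
bins = np.arange(128)
spectrum = (np.exp(-0.5 * ((bins - 30) / 5.0) ** 2)
            + 0.5 * np.exp(-0.5 * ((bins - 90) / 8.0) ** 2))
priors, means, variances = fit_spectral_gmm(spectrum, n_components=2)
features = np.concatenate([means, variances, priors])  # one feature vector per frame
```

In this sketch the means track peak locations, the variances track peak widths and the priors track the relative energy under each peak, which mirrors the abstract's link between the GMM parameters and formant locations, bandwidths and amplitudes.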
Similar resources
A comparative study of Gaussian selection methods in large vocabulary continuous speech recognition
Gaussian mixture models are the most popular probability density used in automatic speech recognition. During decoding, often many Gaussians are evaluated. Only a small number of Gaussians contributes significantly to probability. Several promising methods to select relevant Gaussians are known. These methods have different properties in terms of required memory, overhead and quality of selecte...
N-best based stochastic mapping on stereo HMM for noise robust speech recognition
In this paper we present an extension of our previously proposed feature space stereo-based stochastic mapping (SSM). As distinct from an auxiliary stereo Gaussian mixture model in the front-end in our previous work, a stereo HMM model in the back-end is used. The basic idea, as in feature space SSM, is to form a joint space of the clean and noisy features, but to train a Gaussian mixture HMM i...
Reduced Gaussian mixture models in a large vocabulary continuous speech recognizer
Large vocabulary continuous speech recognition (LVCSR) systems usually employ several tens of thousands of Gaussian mixture components for an accurate statistical representation of naturally spoken human speech. For applications that cannot afford the computationally expensive evaluation of numerous Gaussians during recognition time, it is an important question whether the number of Gaussians can ...
The bucket box intersection (BBI) algorithm for fast approximative evaluation of diagonal mixture Gaussians
Today, most of the state-of-the-art speech recognizers are based on Hidden Markov modeling. Using semi-continuous or continuous density Hidden Markov Models, the computation of emission probabilities requires the evaluation of mixture Gaussian probability density functions. Since it is very expensive to evaluate all the Gaussians of the mixture density codebook, many recognizers only compute th...
Publication date: 2001